Consolidated Trees: An Analysis of Structural Convergence
Authors
Abstract
When different subsamples of the same data set are used to induce classification trees, the structures of the resulting classifiers differ greatly. The stability of the tree structure is of capital importance in many domains, such as illness diagnosis, fraud detection in different fields, and customer behaviour analysis (marketing), where comprehensibility of the classifier is necessary. We have developed a methodology for building classification trees from multiple samples in which the final classifier is a single decision tree (Consolidated Trees). The paper presents an analysis of the structural stability of our algorithm versus the C4.5 algorithm. The classification trees generated with our algorithm achieve lower error rates and are structurally steadier than those of C4.5 when resampling techniques are used. The main focus of this paper is to show how Consolidated Trees built with different sets of subsamples tend to converge to the same tree as the number of subsamples increases.
Similar resources
Consolidated Tree Construction Algorithm: Structurally Steady Trees
This paper presents a new methodology for building decision trees, or classification trees (the Consolidated Trees Construction algorithm), that addresses the structural unsteadiness that appears in this paradigm when the training set varies slightly. As a consequence, the understanding of the classification made is not lost, which distinguishes this technique from techniques such as bagging and b...
The eccentric connectivity index of bucket recursive trees
If $G$ is a connected graph with vertex set $V$, then the eccentric connectivity index of $G$, $\xi^c(G)$, is defined as $\sum_{v\in V(G)}\deg(v)\,\mathrm{ecc}(v)$, where $\deg(v)$ is the degree of a vertex $v$ and $\mathrm{ecc}(v)$ is its eccentricity. In this paper we show some convergence in probability and an asymptotic normality based on this index in random bucket recursive trees.
Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance
This paper presents an analysis of the behaviour of Consolidated Trees, CT (classification trees induced from multiple subsamples but without loss of explanatory capacity). We analyse how CT trees behave when used to solve a fraud detection problem in a car insurance company. This domain has two important characteristics: the explanation given for the classification made is critical to help inves...
Dynamics and Structural characteristics of a natural unlogged oriental beech (Fagus orientalis Lipsky) stand during a 5-years period in Shast Kalate Forest, Northern Iran
Investigating the structure and dynamics of natural forest ecosystems is important for silvicultural decisions. The aim of this study is to analyse the dynamics and structure of a beech stand over a 5-year period in the Shast Kalateh forest in the Caspian region of northern Iran. Data were collected from a 16.9 ha permanent research plot established in a natural unlogged stand in 2006. All li...
The Effect of the Used Resampling Technique and Number of Samples in Consolidated Trees’ Construction Algorithm
In many pattern recognition problems, the explanation of the classification made becomes as important as the classifier's performance in terms of discriminating capacity. For this kind of problem we can use the Consolidated Trees’ Construction (CTC) algorithm, which uses several subsamples to build a single tree. This paper presents a wide analysis of the behaviour of the CTC algorithm for ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2006